Section 2 describes routines within Polymetrics used for similarity searches. Similarity search can be used for
Presently, similarity search in Polymetrics uses the k-nearest neighbors (kNN) method with a predefined tolerance to identify similar products. The results are displayed in an interactive heatmap and can also be obtained in tabular form.
In addition to Polymetrics, the following libraries are imported:
import pandas as pd
import Polymetrics as poly
import FileImport
import polymetrics_config
from IPython.display import display
The data used in the example below is compiled using the Material Search option in UL Prospector (https://www.ulprospector.com/).
A few PE resins from the following manufacturers were included in the analysis.
The data was added to the Excel file 'Example_Dataset.xlsx' under the project name 'kNN' and is imported using the FileImport module in Polymetrics.
The XLSXImport function imports the data and conditions some datatypes for easier processing. A short preview of the dataset is shown below.
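# Import the 'Data' sheet of the example workbook
df_in = FileImport.XLSXImport("Article/Example_Dataset.xlsx", sheet_name='Data')
# Keep only the rows that belong to the 'kNN' project
result = df_in[df_in['Project'] == 'kNN']
# Drop columns that contain missing values, then preview the last five rows
df_kNN = result.dropna(axis=1)
display(df_kNN.tail(5))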
| | Identifier | Name | UID | Project | Type | Density | Tm | I2 |
|---|---|---|---|---|---|---|---|---|
| 75 | DOWLEX_GM_8091 | DOWLEX™ GM 8091 | 687A | kNN | Resin_Commercial | 0.918 | 111.0 | 1.0 |
| 76 | DOWLEX_NG_5045P | DOWLEX™ NG 5045P | 6DYZ | kNN | Resin_Commercial | 0.917 | 118.0 | 0.8 |
| 77 | ELITE_AT_6111 | ELITE™ AT 6111 | 6EXO | kNN | Resin_Commercial | 0.912 | 109.0 | 3.7 |
| 78 | Marlex_5428 | Marlex® 5428 | 6K0I | kNN | Resin_Commercial | 0.930 | 111.0 | 2.2 |
| 79 | Marlex_5430 | Marlex® 5430 | 6LYJ | kNN | Resin_Commercial | 0.925 | 111.0 | 2.2 |
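The filtering step above relies on two plain pandas operations: a boolean mask on the 'Project' column and `dropna(axis=1)`, which removes columns (not rows) that contain missing values. The sketch below illustrates this on a toy DataFrame. The resin names, the 'Notes' column, and all values are hypothetical and are not part of the example dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the imported sheet; all names and values are illustrative only.
df_toy = pd.DataFrame(
    {
        "Name": ["Resin A", "Resin B", "Resin C"],
        "Project": ["kNN", "kNN", "Other"],
        "Density": [0.918, 0.920, 0.950],
        "Tm": [111.0, 118.0, 130.0],
        "Notes": [np.nan, "film grade", "pipe grade"],  # missing for Resin A
    }
)

# Keep only the rows that belong to the 'kNN' project.
subset = df_toy[df_toy["Project"] == "kNN"]

# Drop every column with at least one missing value in the subset.
clean = subset.dropna(axis=1)

print(list(clean.columns))
```

Because 'Notes' is missing for one of the selected rows, the whole column is dropped, which leaves only fully populated columns for the distance calculation.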
The similarity_matrix function takes a DataFrame as input, scales the data, and uses the kNN method to calculate distances between samples. A cut-off is set by choosing a quantile of the pairwise distance data as the threshold. For example, Q = 0.25 sets the cut-off to the first quartile of the pairwise distances. Pairs with distances below the threshold are selected for further analysis.
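The scale, measure, and threshold sequence described above can be sketched with NumPy alone. This is a minimal illustration, not the Polymetrics implementation: the function name `similarity_sketch`, the min-max scaling, the Euclidean metric, and the brute-force pairwise distance matrix (in place of a kNN search) are all assumptions. The input rows are the Density, Tm, and I2 values of the five resins in the preview table above.

```python
import numpy as np

def similarity_sketch(X, q=0.25):
    """Scale features, compute pairwise distances, flag pairs below the q-quantile."""
    X = np.asarray(X, dtype=float)
    # Min-max scale each column to [0, 1] (assumed; Polymetrics may scale differently).
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # guard against constant columns
    Xs = (X - X.min(axis=0)) / span
    # Brute-force Euclidean distance between every pair of samples.
    diff = Xs[:, None, :] - Xs[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Take the quantile over unique pairs only (upper triangle, no diagonal).
    iu = np.triu_indices(len(Xs), k=1)
    threshold = np.quantile(D[iu], q)
    # A pair is "similar" when its distance is below the threshold.
    similar = (D < threshold) & ~np.eye(len(Xs), dtype=bool)
    return D, threshold, similar

# Density, Tm, I2 for the five resins in the preview table.
X = [
    [0.918, 111.0, 1.0],
    [0.917, 118.0, 0.8],
    [0.912, 109.0, 3.7],
    [0.930, 111.0, 2.2],
    [0.925, 111.0, 2.2],
]
D, threshold, similar = similarity_sketch(X, q=0.25)
print(round(float(threshold), 3))
print(np.argwhere(np.triu(similar)))  # index pairs below the cut-off
```

Scaling matters here because the raw columns span very different ranges; without it, Tm would dominate the distance and density would contribute almost nothing.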
In the example below, the similarity between PE resins is determined from density, melt index (MI, column I2), and peak melting temperature (Tm) as reported in UL Prospector. Only pairs whose distance falls below the first quartile of the pairwise distances are kept for further evaluation. The result is plotted as an interactive heatmap.
indices, distances = poly.similarity_matrix(df_kNN, label = 'UID', Q = 0.25, plot = True)
# Parameters:
#   label = column whose values label the heatmap.
#   Q     = quantile of the pairwise distances used as the cut-off.
#   plot  = boolean; turns the heatmap on or off.
The find_similar function takes the outputs of similarity_matrix and returns the polymers similar to the query polymer, sorted by distance in ascending order. The query polymer itself appears first, with a distance of zero.
display(poly.find_similar('3A5L', df_kNN, indices, distances, label = 'UID'))
| | Identifier | Name | UID | Project | Type | Density | Tm | I2 | Distances |
|---|---|---|---|---|---|---|---|---|---|
| 41 | Marlex_5626 | Marlex® 5626 | 3A5L | kNN | Resin_Commercial | 0.922 | 114.0 | 0.65 | 0.000000 |
| 74 | Marlex_5754 | Marlex® 5754 | 5YIM | kNN | Resin_Commercial | 0.925 | 112.0 | 0.80 | 0.079563 |
| 31 | SUPEER_8118 | SUPEER™ 8118 | 2PW6 | kNN | Resin_Commercial | 0.918 | 115.0 | 1.10 | 0.109134 |
| 32 | SUPEER_8118L | SUPEER™ 8118L | 2R3Z | kNN | Resin_Commercial | 0.918 | 115.0 | 1.10 | 0.109134 |
| 71 | DOWLEX_2688G | DOWLEX™ 2688G | 5PN2 | kNN | Resin_Commercial | 0.917 | 117.0 | 0.50 | 0.120966 |
| 75 | DOWLEX_GM_8091 | DOWLEX™ GM 8091 | 687A | kNN | Resin_Commercial | 0.918 | 111.0 | 1.00 | 0.124252 |
| 73 | DOWLEX_GM_8071G | DOWLEX™ GM 8071G | 5WDK | kNN | Resin_Commercial | 0.920 | 118.0 | 0.90 | 0.124591 |
| 48 | ELITE_AT_6501 | ELITE™ AT 6501 | 3I4Y | kNN | Resin_Commercial | 0.914 | 115.0 | 0.85 | 0.140967 |
| 76 | DOWLEX_NG_5045P | DOWLEX™ NG 5045P | 6DYZ | kNN | Resin_Commercial | 0.917 | 118.0 | 0.80 | 0.141541 |
| 55 | DOWLEX_2045G_CIR | DOWLEX™ 2045G CIR | 42XQ | kNN | Resin_Commercial | 0.920 | 119.0 | 1.00 | 0.156368 |
| 56 | DOWLEX_2045LC | DOWLEX™ 2045LC | 4CD2 | kNN | Resin_Commercial | 0.920 | 119.0 | 1.00 | 0.156368 |
| 57 | DOWLEX_2645.11S | DOWLEX™ 2645.11S | 4DIP | kNN | Resin_Commercial | 0.921 | 120.0 | 0.90 | 0.173556 |
| 63 | DOWLEX_HMS_8017 | DOWLEX™ HMS 8017 | 4SQL | kNN | Resin_Commercial | 0.918 | 121.0 | 0.75 | 0.206358 |
| 53 | DOWLEX_2256G | DOWLEX™ 2256G | 3U15 | kNN | Resin_Commercial | 0.920 | 121.0 | 1.00 | 0.207290 |
| 67 | DOWLEX_GM_8070G | DOWLEX™ GM 8070G | 572N | kNN | Resin_Commercial | 0.917 | 121.0 | 0.90 | 0.216377 |
| 52 | DOWLEX_2045.11S | DOWLEX™ 2045.11S | 3NX3 | kNN | Resin_Commercial | 0.922 | 122.0 | 1.00 | 0.231154 |
| 51 | DOWLEX_2045.11G | DOWLEX™ 2045.11G | 3NK6 | kNN | Resin_Commercial | 0.922 | 122.0 | 1.00 | 0.231154 |
| 46 | DOWLEX_2045 | DOWLEX™ 2045 | 3EPY | kNN | Resin_Commercial | 0.920 | 122.0 | 1.00 | 0.233545 |
| 50 | ELITE_AT_6410 | ELITE™ AT 6410 | 3MSN | kNN | Resin_Commercial | 0.912 | 108.0 | 0.85 | 0.238491 |
| 49 | DOWLEX_2601A | DOWLEX™ 2601A | 3KE1 | kNN | Resin_Commercial | 0.924 | 122.0 | 1.30 | 0.253891 |
| 47 | DOWLEX_2070G | DOWLEX™ 2070G | 3HBQ | kNN | Resin_Commercial | 0.922 | 123.0 | 1.00 | 0.257972 |
| 62 | ELITE_5110G | ELITE™ 5110G | 4S5N | kNN | Resin_Commercial | 0.926 | 123.0 | 0.85 | 0.261279 |
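The lookup behind a table like the one above can be sketched as follows: find the query row, keep the samples within the cut-off, and sort them by distance. This is a hedged illustration only. The function `find_similar_sketch`, its signature (a full distance matrix plus a threshold rather than the `indices` and `distances` returned by similarity_matrix), and the toy UIDs and densities are all hypothetical.

```python
import numpy as np
import pandas as pd

def find_similar_sketch(query, df, D, threshold, label="UID"):
    """Return the rows of df within `threshold` of the query row, nearest first."""
    labels = df[label].to_numpy()
    matches = np.flatnonzero(labels == query)
    if matches.size == 0:
        raise KeyError(f"{query!r} not found in column {label!r}")
    i = matches[0]
    # Samples closer than the cut-off (includes the query itself at distance 0).
    keep = np.flatnonzero(D[i] < threshold)
    # Sort the kept samples by ascending distance to the query.
    order = keep[np.argsort(D[i, keep], kind="stable")]
    out = df.iloc[order].copy()
    out["Distances"] = D[i, order]
    return out

# Toy data: hypothetical UIDs with a single feature.
df_toy = pd.DataFrame(
    {"UID": ["A1", "B2", "C3", "D4"], "Density": [0.918, 0.920, 0.925, 0.950]}
)
x = df_toy["Density"].to_numpy()
D = np.abs(x[:, None] - x[None, :])  # pairwise absolute differences

result = find_similar_sketch("B2", df_toy, D, threshold=0.006)
print(result)
```

Returning a copy of the selected rows with an added 'Distances' column mirrors the layout of the table above, so the output can be filtered or exported like any other DataFrame.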